Accurate identification of polyadenylation sites from 3′ end deep sequencing using a naïve Bayes classifier
Authors
Abstract
Motivation: 3′ end processing is important for transcription termination, mRNA stability and regulation of gene expression. To identify 3′ ends, most techniques use an oligo-dT primer to construct deep sequencing libraries. However, this approach can lead to identification of artifactual polyadenylation sites due to internal priming in homopolymeric stretches of adenines. Although heuristic filters have been applied in these cases, they typically result in a high proportion of both false-positive and false-negative classifications. Therefore, there is a need to develop improved algorithms to better identify mis-priming events in oligo-dT primed sequences.
Results: By analyzing sequence features flanking 3′ ends derived from oligo-dT-based sequencing, we developed a naïve Bayes classifier to classify them as true or false/internally primed. The resulting algorithm is highly accurate, outperforms previous heuristic filters and facilitates identification of novel polyadenylation sites.
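As a rough illustration of the idea in the abstract, the sketch below trains a small Bernoulli naïve Bayes model that labels a candidate 3′ end as "true" or "internal" (internally primed) from the genomic sequence just downstream of it. The abstract does not list the features used, so the two binary features here (A-richness of a 10-nt window and presence of a run of six adenines), the toy training sequences, and all function and class names are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def features(downstream: str) -> dict:
    """Binary features of the sequence downstream of a candidate 3' end.

    Hypothetical feature set: the paper's actual features are not given here.
    """
    seq = downstream.upper()
    window = seq[:10]
    return {
        "a_rich": window.count("A") >= 6,  # at least 6 adenines in first 10 nt
        "a_run": "AAAAAA" in seq,          # homopolymeric stretch of 6 adenines
    }


class BernoulliNaiveBayes:
    """Naive Bayes over binary features with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.names = sorted(X[0])
        counts = Counter(y)
        self.log_prior = {c: math.log(counts[c] / len(y)) for c in self.classes}
        self.p = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            for f in self.names:
                k = sum(1 for x in rows if x[f])
                # Smoothed estimate of P(feature is True | class)
                self.p[(c, f)] = (k + 1) / (len(rows) + 2)
        return self

    def log_scores(self, x):
        """Unnormalized log posterior for each class."""
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for f in self.names:
                p = self.p[(c, f)]
                s += math.log(p if x[f] else 1 - p)
            scores[c] = s
        return scores

    def predict(self, x):
        scores = self.log_scores(x)
        return max(scores, key=scores.get)


# Toy training data (invented): downstream sequence and label.
train = [
    ("AAAAAAAAGA", "internal"),
    ("AAAAAAGAAA", "internal"),
    ("AAGAAAAAAA", "internal"),
    ("TGTGTTCTGT", "true"),
    ("GTCTGTGATT", "true"),
    ("CTTGAAATGT", "true"),
]
X = [features(seq) for seq, _ in train]
y = [label for _, label in train]
model = BernoulliNaiveBayes().fit(X, y)

print(model.predict(features("AAAAAAAAAA")))
print(model.predict(features("TTGTCTGTTC")))
```

Unlike a hard heuristic cutoff, the classifier combines the evidence from each feature as log-probabilities, so the class scores can also be thresholded to trade false positives against false negatives.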
Similar resources
Accurate identification of polyadenylation sites from 3′ end deep sequencing using a naïve Bayes classifier
Motivation: 3′ end processing is important for transcription termination, mRNA stability and regulation of gene expression. To identify 3′ ends, most techniques use an oligo-dT primer to construct deep sequencing libraries. However, this approach can lead to identification of artifactual polyadenylation sites due to internal priming in homopolymeric stretches of adenines. Although heuristic fil...
Application of a Naïve Bayes Classifier to Assign Polyadenylation Sites from 3' End Deep Sequencing Data: A Dissertation
Cleavage and polyadenylation of a precursor mRNA is important for transcription termination, mRNA stability, and regulation of gene expression. This process is directed by a multitude of protein factors and cis elements in the pre-mRNA sequence surrounding the cleavage and polyadenylation site. Importantly, the location of the cleavage and polyadenylation site helps define the 3’ untranslated r...
PolyA_DB 3 catalogs cleavage and polyadenylation sites identified by deep sequencing in multiple genomes
PolyA_DB is a database cataloging cleavage and polyadenylation sites (PASs) in several genomes. Previous versions were based mainly on expressed sequence tags (ESTs), which had a limited amount and could lead to inaccurate PAS identification due to the presence of internal A-rich sequences in transcripts. Here, we present an updated version of the database based solely on deep sequencing data. ...
Groundwater Potential Mapping using Index of Entropy and Naïve Bayes Models at Ardabil Plain
Although groundwater resources have long been selected as a safe choice for meeting human water requirements, their overexploitation, especially at Ardabil plain, has led to a decrease in the quality and quantity of these resources. One significant solution is to identify the groundwater potential zones and exploit them according to their potential. The aim of th...
Image Classification Using Naïve Bayes Classifier
An image classification scheme using Naïve Bayes Classifier is proposed in this paper. The proposed Naive Bayes Classifier-based image classifier can be considered as the maximum a posteriori decision rule. The Naïve Bayes Classifier can produce very accurate classification results with a minimum training time when compared to conventional supervised or unsupervised learning algorithms. Compreh...
Journal:
- Bioinformatics
Volume 29, Issue 20
Pages: -
Publication date: 2013